zero-inflated data
From Noise to Precision: A Diffusion-Driven Approach to Zero-Inflated Precipitation Prediction
Gao, Wentao, Li, Jiuyong, Liu, Lin, Le, Thuc Duy, Chen, Xiongren, Du, Xiaojing, Liu, Jixue, Zhao, Yanchang, Chen, Yun
Zero-inflated data pose significant challenges in precipitation forecasting due to the predominance of zeros with sparse non-zero events. To address this, we propose the Zero Inflation Diffusion Framework (ZIDF), which integrates Gaussian perturbation for smoothing zero-inflated distributions, Transformer-based prediction for capturing temporal patterns, and diffusion-based denoising to restore the original data structure. In our experiments, we use observational precipitation data collected from South Australia along with synthetically generated zero-inflated data. Results show that ZIDF demonstrates significant performance improvements over multiple state-of-the-art precipitation forecasting models, achieving up to 56.7% reduction in MSE and 21.1% reduction in MAE relative to the baseline Non-stationary Transformer. These findings highlight ZIDF's ability to robustly handle sparse time series data and suggest its potential generalizability to other domains where zero inflation is a key challenge.
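The Gaussian perturbation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic generator, the function names (`make_zero_inflated`, `gaussian_perturb`), the gamma parameters, and the noise scale `sigma=0.1` are all assumptions chosen for illustration. It only shows how adding Gaussian noise turns a point mass at zero into a continuous distribution; the Transformer prediction and diffusion-based denoising stages are not covered.

```python
import random

def make_zero_inflated(n, zero_prob, rng):
    """Hypothetical synthetic series: zero with probability zero_prob,
    otherwise a gamma-distributed positive amount."""
    return [0.0 if rng.random() < zero_prob else rng.gammavariate(2.0, 3.0)
            for _ in range(n)]

def gaussian_perturb(series, sigma, rng):
    """Add independent N(0, sigma^2) noise to every value (assumed form of
    the smoothing step; the paper may define it differently)."""
    return [x + rng.gauss(0.0, sigma) for x in series]

rng = random.Random(0)                      # fixed seed for reproducibility
rain = make_zero_inflated(1000, 0.8, rng)   # roughly 80% exact zeros
smoothed = gaussian_perturb(rain, 0.1, rng)

# Share of exact zeros before and after the perturbation.
zero_frac_before = sum(1 for x in rain if x == 0.0) / len(rain)
zero_frac_after = sum(1 for x in smoothed if x == 0.0) / len(smoothed)
print(zero_frac_before, zero_frac_after)
```

After perturbation the series no longer contains a spike of identical values, which is the property a downstream sequence model would benefit from; recovering the exact zeros afterwards is the job of the denoising stage described in the abstract.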
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Oceania > Australia > South Australia > Adelaide (0.04)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach
Rožanec, Jože M., Petelin, Gašper, Costa, João, Bertalanič, Blaž, Cerar, Gregor, Guček, Marko, Papa, Gregor, Mladenić, Dunja
In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demand, power consumption of home appliances being turned on and off, impurity measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference from other models has been shown to be statistically significant.
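One common reading of a two-fold (hierarchical) model is a hurdle-style pipeline: a first stage decides whether the target is zero, and a second stage estimates the value only for cases predicted non-zero. The sketch below follows that reading as an assumption; the abstract does not specify the models used, and the threshold classifier, the least-squares line, and the toy data are all hypothetical stand-ins.

```python
from statistics import mean

def fit_threshold(xs, ys):
    """Stage 1: pick the feature threshold that best separates
    zero targets from non-zero targets (predict non-zero if x >= t)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = mean(1.0 if (x >= t) == (y != 0) else 0.0
                   for x, y in zip(xs, ys))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def fit_line(xs, ys):
    """Stage 2: ordinary least squares y = a*x + b.
    Assumes at least two distinct x values."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def fit_two_stage(xs, ys):
    """Fit the classifier on all rows, the regressor on non-zero rows only."""
    t = fit_threshold(xs, ys)
    nonzero = [(x, y) for x, y in zip(xs, ys) if y != 0]
    a, b = fit_line([x for x, _ in nonzero], [y for _, y in nonzero])
    return t, a, b

def predict(model, x):
    """Return 0.0 when stage 1 predicts zero, else the stage 2 estimate."""
    t, a, b = model
    return a * x + b if x >= t else 0.0

# Toy data: 70% zeros, non-zero targets follow y = 2x + 2.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 0, 0, 16, 18, 20]
model = fit_two_stage(xs, ys)
print(predict(model, 3), predict(model, 10))
```

Fitting the regressor only on non-zero rows keeps the many zeros from pulling its estimates toward zero, which is the failure mode the abstract attributes to models trained directly on zero-inflated targets.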
- Europe > Slovenia (0.04)
- Europe > United Kingdom (0.04)
- Transportation > Ground (0.75)
- Transportation > Infrastructure & Services > Airport (0.74)
Copula-Based Density Estimation Models for Multivariate Zero-Inflated Continuous Data
Zero-inflated continuous data appear in many fields: a large share of observations are exactly zero, while the remaining values are continuously distributed. Because the distribution mixes a discrete and a continuous component, statistical analysis is challenging, especially in the multivariate case. In this paper, we propose two copula-based density estimation models that can cope with multivariate correlation among zero-inflated continuous variables. To overcome the difficulty of using copulas caused by the tied-data problem in zero-inflated data, we propose a new type of copula, the rectified Gaussian copula, and present efficient methods for parameter estimation and likelihood computation. Numerical experiments demonstrate the superiority of our proposals over conventional density estimation methods.
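The mixed discrete-continuous structure and the tied-data problem can be made concrete with a small simulation. The sketch below is not the paper's rectified Gaussian copula or its estimator; it only assumes a simple construction in which two correlated latent Gaussians are clipped at zero, with hypothetical parameters `mu=-0.5` and `rho=0.7`. Under that assumption each variable is exactly zero with probability Φ(-μ), and the zeros form a large group of tied observations.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_rectified_pair(mu, rho, rng):
    """Draw two latent N(mu, 1) variables with correlation rho,
    then clip each at zero (illustrative construction only)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return max(0.0, mu + z1), max(0.0, mu + z2)

rng = random.Random(0)
pairs = [sample_rectified_pair(-0.5, 0.7, rng) for _ in range(20000)]

# Empirical share of exact zeros in the first variable.
zero_frac = sum(1 for x, _ in pairs if x == 0.0) / len(pairs)
# Theoretical share: P(mu + z <= 0) = Phi(-mu) = Phi(0.5).
expected_zero = normal_cdf(0.5)
# Share of pairs where both variables are zero at once.
both_zero = sum(1 for x, y in pairs if x == 0.0 and y == 0.0) / len(pairs)
print(zero_frac, expected_zero, both_zero)
```

With positive latent correlation, joint zeros occur more often than the product of the marginal zero probabilities would suggest, so the dependence between variables shows up in the zero pattern as well as in the positive values.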
- North America > United States > New York (0.04)
- Asia > Taiwan (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan (0.04)